feat(vertex): support embedding via :predict endpoint #4640

vinci7 wants to merge 1 commit into QuantumNous:main from
Conversation
Implement Vertex AI embedding by translating Gemini-format embedding requests into Vertex's :predict format and converting the response back to OpenAI format. This is a re-submission of QuantumNous#2488 with two additional fixes:

1. Routing covers the OpenAI-compatible /v1/embeddings path, not only the Gemini-native :embedContent / :batchEmbedContents paths.
2. The response is converted from Vertex predict format ({"predictions":[{"embeddings":{"values":[...]}}]}) into OpenAIEmbeddingResponse so OpenAI clients can parse it.

Changes:
- vertex/adaptor.go:
  - URL builder appends :predict for any model name containing "embedding"
  - ConvertEmbeddingRequest delegates to the gemini adaptor
  - DoRequest reshapes Gemini {content,parts,taskType,title,outputDimensionality} into Vertex {instances:[{content,task_type,title}], parameters:{outputDimensionality}}
  - DoResponse routes embedding responses to vertexEmbeddingHandler via a new isVertexEmbedding(info) helper that matches both the URL path and embedding model name prefixes
- vertex/relay-vertex.go:
  - VertexEmbeddingResponse struct
  - vertexEmbeddingHandler: parses predictions, converts to OpenAIEmbeddingResponse, writes back to the client
  - isVertexEmbedding helper

All JSON ops use common.Marshal/Unmarshal per Rule 1. Tested against gemini-embedding-001, text-embedding-005, and text-multilingual-embedding-002 on us-central1 and global locations.

Closes-related-to: QuantumNous#2488 (auto-closed due to main force-push, never merged).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Walkthrough

This PR adds Vertex embedding support by introducing request/response handling for Vertex embedding models. ConvertEmbeddingRequest delegates to the Gemini adaptor, DoRequest rewrites Gemini embedding payloads into Vertex format, and DoResponse routes embeddings to a new vertexEmbeddingHandler that transforms Vertex responses to OpenAI-compatible format.

Changes: Vertex embedding support
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant Router
    participant DoRequest as DoRequest<br/>(Payload Transform)
    participant VertexAPI as Vertex API
    participant DoResponse as DoResponse<br/>(Routing)
    participant Handler as vertexEmbeddingHandler<br/>(Transform)
    participant Return
    Client->>Router: Embedding request (Gemini model)
    Router->>DoRequest: Route based on GetRequestURL<br/>(embedding model detected)
    DoRequest->>DoRequest: ConvertEmbeddingRequest<br/>(delegate to Gemini adaptor)
    DoRequest->>DoRequest: Rewrite body to Vertex<br/>payload format
    DoRequest->>VertexAPI: Forward transformed request
    VertexAPI-->>DoResponse: Vertex embedding response
    DoResponse->>DoResponse: isVertexEmbedding check
    DoResponse->>Handler: Route to vertexEmbeddingHandler
    Handler->>Handler: Unmarshal VertexEmbeddingResponse
    Handler->>Handler: Transform to OpenAI format
    Handler->>Handler: Aggregate token usage
    Handler->>Return: Marshal & write response
    Return-->>Client: OpenAI-compatible embedding response
```
Estimated Code Review Effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly Related PRs
🚥 Pre-merge checks: 4 passed, 1 failed (warning)
Actionable comments posted: 2
🧹 Nitpick comments (1)
relay/channel/vertex/adaptor.go (1)
329-329: 💤 Low value. Align embedding detection between request and response paths.

DoRequest keys off `strings.Contains(c.Request.URL.Path, "embed")`, while `isVertexEmbedding` (used in DoResponse) additionally accepts embedding model-name prefixes (`gemini-embedding`, `text-embedding`, `text-multilingual-embedding`). If a request ever lands here with one of those models but a path that doesn't contain `embed`, DoResponse will route to `vertexEmbeddingHandler` while the body is never converted to Vertex `instances`/`parameters`, so Vertex would reject the call. Consider extracting a shared `isVertexEmbedding(info)` check and using it in both places to keep the pre/post conversion symmetric.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@relay/channel/vertex/adaptor.go` at line 329: the current DoRequest check uses `strings.Contains(c.Request.URL.Path, "embed")`, which is not consistent with `isVertexEmbedding(info)` used in DoResponse and can cause asymmetric handling. Change DoRequest to call the same `isVertexEmbedding(info)` helper (or extract one if not exported) instead of relying on `c.Request.URL.Path`, so that when `isVertexEmbedding(info)` returns true (matching model names like "gemini-embedding", "text-embedding", "text-multilingual-embedding" or RequestMode == RequestModeGemini) you run the same Vertex conversion logic that produces Vertex instances/parameters before sending the request; update the branch that currently references RequestMode and path to use `isVertexEmbedding(info)` and ensure `vertexEmbeddingHandler` and any body-conversion code are invoked the same way as DoResponse expects.
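To make the suggested shared check concrete, here is a minimal standalone sketch. The `relayInfo` struct is a stand-in for the project's `relaycommon.RelayInfo`, and its field names are assumptions for illustration; only the matching logic mirrors what the comment describes.

```go
package main

import (
	"fmt"
	"strings"
)

// relayInfo is a hypothetical stand-in for relaycommon.RelayInfo;
// only the two fields the check needs are modeled here.
type relayInfo struct {
	RequestURLPath    string
	UpstreamModelName string
}

// isVertexEmbedding sketches the shared helper the review proposes:
// treat a request as an embedding call when either the URL path
// contains "embed" or the model name carries a known embedding prefix,
// so DoRequest and DoResponse stay symmetric.
func isVertexEmbedding(info *relayInfo) bool {
	if strings.Contains(info.RequestURLPath, "embed") {
		return true
	}
	for _, prefix := range []string{"gemini-embedding", "text-embedding", "text-multilingual-embedding"} {
		if strings.HasPrefix(info.UpstreamModelName, prefix) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(isVertexEmbedding(&relayInfo{RequestURLPath: "/v1/embeddings"}))        // true
	fmt.Println(isVertexEmbedding(&relayInfo{UpstreamModelName: "text-embedding-005"})) // true
	fmt.Println(isVertexEmbedding(&relayInfo{
		RequestURLPath:    "/v1/chat/completions",
		UpstreamModelName: "gemini-1.5-pro",
	})) // false
}
```

Calling this one helper from both DoRequest and DoResponse would make the pre-conversion and post-routing decisions impossible to drift apart.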
📒 Files selected for processing (2)
- relay/channel/vertex/adaptor.go
- relay/channel/vertex/relay-vertex.go
```go
func (a *Adaptor) DoRequest(c *gin.Context, info *relaycommon.RelayInfo, requestBody io.Reader) (any, error) {
	if a.RequestMode == RequestModeGemini && strings.Contains(c.Request.URL.Path, "embed") {
		bodyBytes, err := io.ReadAll(requestBody)
		if err != nil {
			return nil, err
		}

		vertexReq := make(map[string]interface{})
		instances := make([]interface{}, 0)

		if info.IsGeminiBatchEmbedding {
			var req dto.GeminiBatchEmbeddingRequest
			if err := common.Unmarshal(bodyBytes, &req); err == nil {
				for _, r := range req.Requests {
					instance := make(map[string]interface{})
					content := ""
					for _, part := range r.Content.Parts {
						if part.Text != "" {
							content += part.Text
						}
					}
					instance["content"] = content
					if r.TaskType != "" {
						instance["task_type"] = r.TaskType
					}
					if r.Title != "" {
						instance["title"] = r.Title
					}
					instances = append(instances, instance)
				}
			}
		} else {
			var req dto.GeminiEmbeddingRequest
			if err := common.Unmarshal(bodyBytes, &req); err == nil {
				instance := make(map[string]interface{})
				content := ""
				for _, part := range req.Content.Parts {
					if part.Text != "" {
						content += part.Text
					}
				}
				instance["content"] = content
				if req.TaskType != "" {
					instance["task_type"] = req.TaskType
				}
				if req.Title != "" {
					instance["title"] = req.Title
				}
				instances = append(instances, instance)

				if req.OutputDimensionality > 0 {
					vertexReq["parameters"] = map[string]interface{}{
						"outputDimensionality": req.OutputDimensionality,
					}
				}
			}
		}
		vertexReq["instances"] = instances
		newBodyBytes, _ := common.Marshal(vertexReq)
		requestBody = bytes.NewReader(newBodyBytes)
		logger.LogDebug(c, "Vertex Embedding request body: "+string(newBodyBytes))
	}
	return channel.DoApiRequest(a, c, info, requestBody)
}
```
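To see the body rewrite in isolation, here is a self-contained sketch of the single-request branch. The `geminiEmbeddingRequest` struct is a cut-down stand-in for the project's `dto.GeminiEmbeddingRequest` with assumed JSON tags; it is not the real DTO.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// geminiEmbeddingRequest models only the fields the reshape touches;
// the JSON tags are assumptions based on the Gemini embedding API shape.
type geminiEmbeddingRequest struct {
	Content struct {
		Parts []struct {
			Text string `json:"text"`
		} `json:"parts"`
	} `json:"content"`
	TaskType             string `json:"taskType,omitempty"`
	Title                string `json:"title,omitempty"`
	OutputDimensionality int    `json:"outputDimensionality,omitempty"`
}

// toVertexPredictBody mirrors DoRequest's single-request branch:
// {content:{parts},taskType,title,outputDimensionality} becomes
// {instances:[{content,task_type,title}],parameters:{outputDimensionality}}.
func toVertexPredictBody(body []byte) ([]byte, error) {
	var req geminiEmbeddingRequest
	if err := json.Unmarshal(body, &req); err != nil {
		return nil, err
	}
	content := ""
	for _, p := range req.Content.Parts {
		content += p.Text
	}
	instance := map[string]interface{}{"content": content}
	if req.TaskType != "" {
		instance["task_type"] = req.TaskType
	}
	if req.Title != "" {
		instance["title"] = req.Title
	}
	vertexReq := map[string]interface{}{"instances": []interface{}{instance}}
	if req.OutputDimensionality > 0 {
		vertexReq["parameters"] = map[string]interface{}{"outputDimensionality": req.OutputDimensionality}
	}
	return json.Marshal(vertexReq)
}

func main() {
	in := []byte(`{"content":{"parts":[{"text":"hello"}]},"taskType":"RETRIEVAL_QUERY","outputDimensionality":768}`)
	out, err := toVertexPredictBody(in)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
	// {"instances":[{"content":"hello","task_type":"RETRIEVAL_QUERY"}],"parameters":{"outputDimensionality":768}}
}
```

Note that, unlike the PR code above, this sketch returns the unmarshal error instead of falling through, which is exactly the change the review requests.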
Surface body-conversion errors instead of swallowing them.

The new embedding rewrite path silently ignores both unmarshal and marshal failures:

- Lines 340 and 361: `if err := common.Unmarshal(bodyBytes, &req); err == nil { ... }`. When parsing fails, control just falls through and the code sends `{"instances":[]}` to Vertex. The client then sees a confusing upstream `400 INVALID_ARGUMENT: Should provide instances for text model prediction` while the real parse error is gone.
- Line 386: `newBodyBytes, _ := common.Marshal(vertexReq)`. If marshalling fails, an empty/nil body is forwarded silently.

Please return the error in both cases so misuse / schema drift is observable.
🛠️ Suggested fix

```diff
 		if info.IsGeminiBatchEmbedding {
 			var req dto.GeminiBatchEmbeddingRequest
-			if err := common.Unmarshal(bodyBytes, &req); err == nil {
-				for _, r := range req.Requests {
-					instance := make(map[string]interface{})
-					content := ""
-					for _, part := range r.Content.Parts {
-						if part.Text != "" {
-							content += part.Text
-						}
-					}
-					instance["content"] = content
-					if r.TaskType != "" {
-						instance["task_type"] = r.TaskType
-					}
-					if r.Title != "" {
-						instance["title"] = r.Title
-					}
-					instances = append(instances, instance)
-				}
-			}
+			if err := common.Unmarshal(bodyBytes, &req); err != nil {
+				return nil, fmt.Errorf("failed to parse gemini batch embedding request: %w", err)
+			}
+			for _, r := range req.Requests {
+				instance := make(map[string]interface{})
+				content := ""
+				for _, part := range r.Content.Parts {
+					if part.Text != "" {
+						content += part.Text
+					}
+				}
+				instance["content"] = content
+				if r.TaskType != "" {
+					instance["task_type"] = r.TaskType
+				}
+				if r.Title != "" {
+					instance["title"] = r.Title
+				}
+				instances = append(instances, instance)
+			}
 		} else {
 			var req dto.GeminiEmbeddingRequest
-			if err := common.Unmarshal(bodyBytes, &req); err == nil {
-				instance := make(map[string]interface{})
-				content := ""
-				for _, part := range req.Content.Parts {
-					if part.Text != "" {
-						content += part.Text
-					}
-				}
-				instance["content"] = content
-				if req.TaskType != "" {
-					instance["task_type"] = req.TaskType
-				}
-				if req.Title != "" {
-					instance["title"] = req.Title
-				}
-				instances = append(instances, instance)
-
-				if req.OutputDimensionality > 0 {
-					vertexReq["parameters"] = map[string]interface{}{
-						"outputDimensionality": req.OutputDimensionality,
-					}
-				}
-			}
+			if err := common.Unmarshal(bodyBytes, &req); err != nil {
+				return nil, fmt.Errorf("failed to parse gemini embedding request: %w", err)
+			}
+			instance := make(map[string]interface{})
+			content := ""
+			for _, part := range req.Content.Parts {
+				if part.Text != "" {
+					content += part.Text
+				}
+			}
+			instance["content"] = content
+			if req.TaskType != "" {
+				instance["task_type"] = req.TaskType
+			}
+			if req.Title != "" {
+				instance["title"] = req.Title
+			}
+			instances = append(instances, instance)
+
+			if req.OutputDimensionality > 0 {
+				vertexReq["parameters"] = map[string]interface{}{
+					"outputDimensionality": req.OutputDimensionality,
+				}
+			}
 		}
 		vertexReq["instances"] = instances
-		newBodyBytes, _ := common.Marshal(vertexReq)
+		newBodyBytes, err := common.Marshal(vertexReq)
+		if err != nil {
+			return nil, fmt.Errorf("failed to marshal vertex embedding request: %w", err)
+		}
 		requestBody = bytes.NewReader(newBodyBytes)
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@relay/channel/vertex/adaptor.go` around lines 328 - 391, The DoRequest
embedding-rewrite silently swallows Unmarshal and Marshal errors, causing
confusing upstream 400s; update the blocks in DoRequest (function
Adaptor.DoRequest) so that when common.Unmarshal(bodyBytes, &req) returns a
non-nil error you immediately return nil and that error (or a wrapped error with
context like "gemini embedding unmarshal"), and likewise check the error from
common.Marshal(vertexReq) and return it instead of ignoring it; reference the
Unmarshal locations that handle dto.GeminiBatchEmbeddingRequest and
dto.GeminiEmbeddingRequest and the Marshal call that produces newBodyBytes to
make the changes.
```go
func vertexEmbeddingHandler(c *gin.Context, resp *http.Response, info *relaycommon.RelayInfo) (*dto.Usage, *types.NewAPIError) {
	defer service.CloseResponseBodyGracefully(resp)

	responseBody, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, types.NewOpenAIError(err, types.ErrorCodeBadResponseBody, http.StatusInternalServerError)
	}

	if common.DebugEnabled {
		logger.LogDebug(c, "Vertex Embedding response body: "+string(responseBody))
	}

	var vertexResponse VertexEmbeddingResponse
	if err := common.Unmarshal(responseBody, &vertexResponse); err != nil {
		return nil, types.NewOpenAIError(err, types.ErrorCodeBadResponseBody, http.StatusInternalServerError)
	}

	openAIResponse := dto.OpenAIEmbeddingResponse{
		Object: "list",
		Data:   make([]dto.OpenAIEmbeddingResponseItem, 0, len(vertexResponse.Predictions)),
		Model:  info.UpstreamModelName,
	}

	tokenCount := 0
	for i, prediction := range vertexResponse.Predictions {
		openAIResponse.Data = append(openAIResponse.Data, dto.OpenAIEmbeddingResponseItem{
			Object:    "embedding",
			Embedding: prediction.Embeddings.Values,
			Index:     i,
		})
		tokenCount += prediction.Embeddings.Statistics.TokenCount
	}

	usage := &dto.Usage{
		PromptTokens: tokenCount,
		TotalTokens:  tokenCount,
	}
	openAIResponse.Usage = *usage

	jsonResponse, err := common.Marshal(openAIResponse)
	if err != nil {
		return nil, types.NewOpenAIError(err, types.ErrorCodeBadResponseBody, http.StatusInternalServerError)
	}

	c.Writer.Header().Set("Content-Type", "application/json")
	c.Writer.WriteHeader(http.StatusOK)
	_, _ = c.Writer.Write(jsonResponse)

	return usage, nil
}
```
🧩 Analysis chain

🏁 Script executed:

```bash
#!/bin/bash
# Check whether other handlers in this repo guard on resp.StatusCode and how relay dispatches DoResponse.
rg -nP -C3 '\bresp\.StatusCode\b' relay/channel/vertex relay/channel/gemini
rg -nP -C5 'DoResponse\(' relay/relay_adaptor.go relay/relay_text.go relay/relay_embedding.go 2>/dev/null
fd -t f -e go . relay | xargs rg -nP -C2 'StatusCode\s*!=\s*http\.StatusOK' | head -n 80
```

Repository: QuantumNous/new-api
Length of output: 6210
Add status code check to vertexEmbeddingHandler.
vertexEmbeddingHandler unconditionally unmarshals resp.Body without checking resp.StatusCode. When Vertex returns an error (4xx/5xx), the body contains {"error":{...}} which silently unmarshals into a struct with empty Predictions. The client receives a 200 OK with an empty embedding list while the billing layer records 0 tokens for what was actually an upstream failure.
Add an early return on non-2xx status, mirroring the pattern used throughout the relay framework (e.g., relay/embedding_handler.go:72, relay/claude_handler.go:196):
```go
if resp.StatusCode != http.StatusOK {
	newAPIError := service.RelayErrorHandler(c.Request.Context(), resp, false)
	return nil, newAPIError
}
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@relay/channel/vertex/relay-vertex.go` around lines 65 - 114,
vertexEmbeddingHandler currently unmarshals resp.Body even for non-2xx upstream
responses, producing empty Predictions and returning 200; add an early status
check after reading the body (or immediately after defer) to mirror the relay
pattern: if resp.StatusCode != http.StatusOK call
service.RelayErrorHandler(c.Request.Context(), resp, false) and return its
result as the *types.NewAPIError so the handler stops and propagates the proper
error; update vertexEmbeddingHandler (and ensure this behavior applies before
unmarshalling into VertexEmbeddingResponse and before constructing the
OpenAIEmbeddingResponse/usage).
Summary
Implement Vertex AI embedding support by translating Gemini-format embedding requests into Vertex's :predict format and converting the response back to OpenAI format. Currently relay/channel/vertex/adaptor.go::ConvertEmbeddingRequest returns not implemented.

This is a re-submission of #2488 with two additional fixes that the original PR was missing:

1. Routing now covers the OpenAI-compatible /v1/embeddings path, not only the Gemini-native :embedContent / :batchEmbedContents paths. The original PR's check was nested inside info.RelayMode == constant.RelayModeGemini, so OpenAI-compat embedding requests fell through to gemini.GeminiChatHandler.
2. The response was previously returned in raw Vertex predict format ({"predictions":[{"embeddings":{"values":[...]}}]}), which OpenAI clients cannot parse. Output is now a proper dto.OpenAIEmbeddingResponse.

Changes

relay/channel/vertex/adaptor.go:
- URL builder appends :predict when the model name contains embedding
- ConvertEmbeddingRequest delegates to the existing gemini adaptor (which already handles the OpenAI→Gemini conversion)
- DoRequest intercepts embedding requests and reshapes Gemini-format {content:{parts:[{text}]}, taskType, title, outputDimensionality} into Vertex-format {instances:[{content, task_type, title}], parameters:{outputDimensionality}}
- DoResponse routes embedding responses through the new vertexEmbeddingHandler via the isVertexEmbedding(info) helper

relay/channel/vertex/relay-vertex.go:
- VertexEmbeddingResponse struct (parses predictions[].embeddings.values and statistics.token_count)
- vertexEmbeddingHandler converts the Vertex response to dto.OpenAIEmbeddingResponse and writes it back to the client
- isVertexEmbedding(info) helper matches both a URL path containing embed and embedding model name prefixes (gemini-embedding-*, text-embedding-*, text-multilingual-embedding-*)

All JSON operations use common.Marshal / common.Unmarshal per Rule 1. No changes to non-vertex code paths.

Test plan

Verified on a Vertex AI channel against the following models on us-central1 and global locations:
- gemini-embedding-001 → 3072-dim vectors, OpenAI format
- text-embedding-005 → 768-dim vectors
- text-multilingual-embedding-002 → 768-dim vectors

Sample request:

Sample response:

```json
{
  "object": "list",
  "data": [{"object": "embedding", "index": 0, "embedding": [-0.034, 0.011, ...]}],
  "model": "gemini-embedding-001",
  "usage": {"prompt_tokens": 1, "total_tokens": 1}
}
```

Related
Summary by CodeRabbit